Establishing evidenced-based best practice for the de novo assembly and evaluation of transcriptomes from non-model organisms
Abstract
Characterizing transcriptomes in both model and non-model organisms has resulted in a massive increase in our understanding of biological phenomena. This boon, largely made possible via high-throughput sequencing, means that studies of functional, evolutionary and population genomics are now being done by hundreds or even thousands of labs around the world. For many, these studies begin with a de novo transcriptome assembly, which is a technically complicated process involving several discrete steps. Each step may be accomplished in one of several different ways, using different software packages, each producing different results. This analytical complexity raises the question: which method(s) are optimal? Using reference- and non-reference-based evaluative methods, I propose a set of guidelines that aim to standardize and facilitate the process of transcriptome assembly. These recommendations include the generation of between 20 million and 40 million sequencing reads from a single individual where possible, error correction of reads, gentle quality trimming, assembly filtering using Transrate and/or gene expression, annotation using dammit, and appropriate reporting. These recommendations have been extensively benchmarked and applied to publicly available transcriptomes, resulting in improvements in both content and contiguity. To facilitate the implementation of the proposed standardized methods, I have released a set of version-controlled open-source code, The Oyster River Protocol for Transcriptome Assembly, available at http://oyster-river-protocol.rtfd.org/.
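Two of the recommendations in the abstract, the 20 to 40 million read depth and filtering an assembly by gene expression, can be illustrated with a minimal Python sketch. It is not part of the Oyster River Protocol. The function names, the in-memory FASTQ handling, and the TPM cutoff of 1.0 are assumptions made for illustration; the abstract gives no expression threshold.

```python
from typing import Dict, Iterable

# Read-depth range recommended in the abstract (reads per sample).
MIN_READS = 20_000_000
MAX_READS = 40_000_000


def count_fastq_reads(lines: Iterable[str]) -> int:
    """Count reads in uncompressed FASTQ text, assuming exactly 4 lines per record."""
    n_lines = sum(1 for _ in lines)
    return n_lines // 4


def depth_in_recommended_range(n_reads: int) -> bool:
    """Return True if the read count falls within the 20-40 million range."""
    return MIN_READS <= n_reads <= MAX_READS


def filter_by_expression(
    tpm_by_contig: Dict[str, float], min_tpm: float = 1.0
) -> Dict[str, float]:
    """Keep contigs whose expression is at least min_tpm.

    The default cutoff of 1.0 TPM is a hypothetical value chosen for this
    sketch, not a threshold stated in the abstract.
    """
    return {name: tpm for name, tpm in tpm_by_contig.items() if tpm >= min_tpm}
```

In practice the expression values would come from a quantification tool run against the assembly, and the cutoff would be chosen by checking how filtering changes the assembly's evaluation scores.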
Similar resources
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
A Comparison of Next Generation Sequencing Technologies for Transcriptome Assembly and Utility for RNA-Seq in a Non-Model Bird
De novo assembled transcriptomes, in combination with RNA-Seq, are powerful tools to explore gene sequence and expression level in organisms without reference genomes. Investigators must first choose which high throughput sequencing platforms will provide data most suitable for their experimental goals. In this study, we explore the utility of 454 and Illumina sequences for de novo transcriptom...
Comparison of De Novo Transcriptome Assemblers and k-mer Strategies Using the Killifish, Fundulus heteroclitus.
BACKGROUND: De novo assembly of non-model organisms' transcriptomes has recently been on the rise in concert with the number of de novo transcriptome assembly software programs. There is a knowledge gap as to what assembler software or k-mer strategy is best for construction of an optimal de novo assembly. Additionally, there is a lack of consensus on which evaluation metrics should be used to a...
Augmenting transcriptome assembly by combining de novo and genome-guided tools
Researchers interested in studying and constructing transcriptomes, especially for non-model species, face the conundrum of choosing from a number of available de novo and genome-guided assemblers. None of the popular assembly tools in use today achieve requisite sensitivity, specificity or recovery of full-length transcripts on their own. Here, we present a comprehensive comparative study of t...
Evaluation of de novo assembly technique in the South African abalone Haliotis midae transcriptome: A comparison from Illumina and 454 systems
Next generation sequencing platforms have recently been used to rapidly characterize transcriptome sequences from a number of non-model organisms. The present study compares two of the most frequently used platforms, the Roche 454-pyrosequencing and the Illumina sequencing-by-synthesis (SBS), on the same RNA sample obtained from an intertidal gastropod mollusc species, Haliotis midae. All the s...
Publication date: 2016